Revealing Collection Structure through Information Access Interfaces

Authors

  • Marti A. Hearst
  • Jan O. Pedersen
Abstract

Information Access research at Xerox PARC focuses on amplifying the users' cognitive abilities, rather than trying to completely automate them. This framework emphasizes the participation of the user in a cycle of query formulation, presentation of results, followed by query reformulation, and so on. This framework is intended to help the user iteratively refine a vaguely understood information need. Since the focus is on query repair, the information presented is typically not document descriptions, but rather intermediate information that indicates relationships between the query and the retrieved documents. We have developed information access tools intended to supply some of this functionality, and describe two of these here. As an illustration, suppose a user is interested in medical diagnosis software. Assume that initially the user has available a large, unfamiliar information source. In our example, this source is the 2.2 Gigabyte TIPSTER text collection [Harman, 1993]. Because the collection is unfamiliar, the user will be unsure whether it contains relevant information, and if so, how to access it. To address this situation, we have developed a browsing method, called Scatter/Gather [Cutting et al., 1992; 1993], that allows a user to rapidly assess the general contents of a very large collection by scanning through a dynamic, hierarchical representation that is motivated by a table-of-contents metaphor. Initially the system automatically scatters, or clusters, the collection into a small number of document groups, and presents short summaries of the groups to the user. These summaries consist of two types of information: topical titles (titles of documents close to the cluster centroid) and typical terms (terms of importance in the cluster). Based on these summaries, the user selects one or more of the groups for further study.
The selected groups are gathered, or unioned, together to form a subcollection. The system then applies clustering again to scatter the new subcollection into a small number of document groups, which are again presented to the user. With each successive iteration the groups become smaller, and therefore more detailed. The user may, at any time, switch to a more focused search method.

Figure 1: A portion of a top-level view of the Scatter/Gather algorithm over the TIPSTER corpus.
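
The scatter, summarize, and gather cycle described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it substitutes a plain k-means-style loop over unit-length bag-of-words vectors for whatever clustering the cited Scatter/Gather papers use, and the function names (`scatter`, `summarize`, `gather`), the seeding rule, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Unit-length bag-of-words vector (assumed representation)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def centroid(vectors):
    """Normalized sum of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {term: w / norm for term, w in total.items()}


def scatter(docs, k, iterations=5):
    """Cluster docs (id -> text) into at most k groups of ids.

    Seeds are simply the first k ids in sorted order (an assumption
    chosen to keep the sketch deterministic).
    """
    ids = sorted(docs)
    vecs = {i: vectorize(docs[i]) for i in ids}
    centers = [vecs[i] for i in ids[:k]]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for i in ids:
            best = max(range(len(centers)),
                       key=lambda c: cosine(vecs[i], centers[c]))
            groups[best].append(i)
        # Recompute each center; keep the old one if a group emptied out.
        centers = [centroid([vecs[i] for i in g]) if g else centers[n]
                   for n, g in enumerate(groups)]
    return [g for g in groups if g]


def summarize(docs, group, n_terms=3):
    """Return (topical document id, typical terms) for one group."""
    vecs = {i: vectorize(docs[i]) for i in group}
    center = centroid(list(vecs.values()))
    # Typical terms: heaviest centroid terms, ties broken alphabetically.
    terms = sorted(center, key=lambda t: (-center[t], t))[:n_terms]
    # Topical document: the one closest to the centroid.
    topical = max(group, key=lambda i: cosine(vecs[i], center))
    return topical, terms


def gather(groups, selected):
    """Union the selected groups into one subcollection of ids."""
    return sorted({i for n in selected for i in groups[n]})


if __name__ == "__main__":
    docs = {  # toy stand-in for a large collection
        "d1": "medical diagnosis software hospital",
        "d2": "oil prices market trading",
        "d3": "medical software patient diagnosis",
        "d4": "market stock trading prices",
    }
    groups = scatter(docs, k=2)
    for n, group in enumerate(groups):
        print(n, group, summarize(docs, group))
    chosen = gather(groups, [0])                      # user picks group 0
    print(scatter({i: docs[i] for i in chosen}, k=2))  # scatter again
```

Each pass through `scatter` and `gather` narrows the collection, which mirrors the statement that the groups become smaller and more detailed with each iteration. A real system over a multi-gigabyte corpus would need a much faster clustering step than this quadratic-feeling loop.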

Similar resources

Access and Mobility Policy Control at the Network Edge

The fifth generation (5G) system architecture is defined as service-based, and the core network functions are described as sets of services accessible through application programming interfaces (APIs). One of the components of 5G is Multi-access Edge Computing (MEC), which provides open access to radio network functions through APIs. Using the mobile edge API third party analytics applications ...

Evaluation of Search Interfaces in Evidence-Based Medicine Databases

Introduction: Proper search interfaces in evidence-based medicine databases lead to quick access to evidence-based medicine. Given their necessity and importance, the aim of this study is to evaluate search interfaces in evidence-based medicine databases. Methods: This study was applied research, carried out through the survey method. The study population was 12 evide...

Delviz: exploration of tagged information visualizations

Classification methods such as social tagging provide an easy way to annotate documents with metadata and describe them from various points of view. Frequently used visualization methods like tag clouds offer limited access to the network of tags and fail to illustrate implicit relationships. Developing user interfaces which reveal these relationships and support the exploration of the resu...

Information Retrieval using Natural Language Interfaces

A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access and modify the data. In a Relational DBMS (RDBMS), data is organized in the form of tables. In order to retrieve information from the database, the end user requires knowledge of a database query language such as Structured Query Language (SQL). However, not every user is able to write SQL queries a...

Investigating the Correlation of Access to the Internet/ Computer System and Information Literacy

Background: Formerly in societies, literacy only implied reading and writing skills. Gradually, the industrial 20th century turned into the information era of the 21st century. Accordingly, to survive in the present world, we need to have information literacy. Acquiring information technology (IT) is an indispensable part of literacy. The current research aims to investigate the correlation of ...

Publication date: 1995